Abstract. The analysis of the solving complexity of random 3-SAT instances using
the Davis-Putnam-Loveland-Logemann (DPLL) algorithm slightly below threshold 
is presented. While finding a solution for such instances demands exponential effort 
with high probability, we show that an exponentially small fraction of resolutions 
require a computation scaling linearly in the size of the instance only. We compute 
analytically this exponentially small probability of easy resolutions from a large 
deviation analysis of DPLL with the Generalized Unit Clause search heuristic, and 
show that the corresponding exponent is smaller (in absolute value) than the growth 
exponent of the typical resolution time. Our study therefore gives some quantitative 
basis to heuristic restart solving procedures, and suggests a natural cut-off cost (the 
size of the instance) for the restart. 
1. Introduction.

Being an NP-complete problem, 3-SAT is not thought to be solvable
in an efficient way, i.e. in a time growing at most polynomially with
the number N of variables. In practice, one therefore resorts to methods that need, a priori,
exponentially large computational resources. One of these algorithms 
is the ubiquitous Davis-Putnam-Loveland-Logemann (DPLL) solving 
procedure (Davis, Logemann and Loveland, 1962; Gu, Purdom, Franco 
and Wah, 1997). DPLL is a complete search algorithm based on back- 
tracking. The sequence of assignments of variables made by DPLL in 
the course of instance solving can be represented as a search tree, 
whose size Q (number of nodes) is a convenient measure of the instance 
hardness. Some examples of search trees are presented in Figure 1. 

In the past few years, much experimental and theoretical progress
has been made on the probabilistic analysis of 3-SAT (Hogg, Huberman
and Williams, 1996; Gent, van Maaren and Walsh, 2000). Distributions
of random instances controlled by few parameters are particularly
useful in shedding light on the onset of complexity. An example
that has attracted a lot of attention is random 3-SAT: the three literals
in each of the M clauses are randomly chosen variables, or their negations
with equal probabilities, among a set of N Boolean variables; clauses
are drawn independently of each other. Experiments (Hogg, Huberman
and Williams, 1996; Mitchell, Selman and Levesque, 1992; Crawford
and Auton, 1996; Kirkpatrick and Selman, 1994) and theory (Friedgut,
1999; Dubois et al., 2001) indicate that clauses can almost surely (a.s.) always
(respectively never) be simultaneously satisfied if the ratio $\alpha = M/N$
of clauses per variable is smaller (resp. larger) than a critical threshold
$\alpha_C \simeq 4.3$ as soon as M, N go to infinity
at fixed ratio $\alpha$. This phase transition (Monasson et al., 1999) is accompanied
by a drastic peak in hardness at threshold (Hogg, Huberman
and Williams, 1996; Mitchell, Selman and Levesque, 1992; Crawford
and Auton, 1996). The emerging pattern of complexity is as follows. At
small ratios $\alpha < \alpha_L$, where $\alpha_L$ depends on the heuristic used by DPLL,
instances are a.s. satisfiable (sat). Finding a solution requires a tree
whose size Q scales only linearly with the size N, and almost no backtracking
is present (Figure 1A). Above the critical ratio, instances are
a.s. unsatisfiable (unsat) and proofs of refutation are obtained through
massive backtracking, leading to an exponential hardness $Q = 2^{N\omega}$
with $\omega > 0$ (Chvátal and Szemerédi, 1988; Beame et al., 1998).

Recently, a quantitative understanding of the pattern of complexity
was proposed to estimate $\omega$ in the unsat regime as a function of the
ratio $\alpha$ of clauses per variable of the 3-SAT instance to be solved,
and to unveil the structure of DPLL's search tree in the upper sat
phase (Figure 1B), i.e. for ratios $\alpha_L < \alpha < \alpha_C$ (Cocco and Monasson,
2001). In the latter range, instances are a.s. sat, but their resolution
requires with high probability (w.h.p.) an exponentially large computational
effort ($\omega > 0$) (Cocco and Monasson, 2001; Coarfa et al., 2000; San
Miguel Aguirre et al., 2001; Achlioptas, Beame and Molloy, 2001c).
Theoretical predictions for $\omega$ as a function of $\alpha$ were derived (Cocco
and Monasson, 2001), extending to the upper sat phase the calculations
of the unsat phase.

In this paper, we study in more detail the upper sat phase, and more
precisely the distribution of resolution complexities of randomly drawn
instances with ratios $\alpha_L < \alpha < \alpha_C$. Using numerical experiments and
analytical calculations, we show that, though complexity Q a.s. grows
as $2^{N\omega}$, there is a finite but exponentially small probability $2^{-N\zeta}$ that
Q is bounded from above by N only. In other words, while finding
solutions to these sat instances is almost always exponentially hard, it
is very rarely easy (polynomial time). Taking advantage of the fact that
$\zeta$ is smaller than $\omega$, we show how systematic restarts of the heuristic
may decrease substantially the overall search cost. Our study therefore
gives some theoretical basis to incomplete restart techniques known to
be efficient to solve satisfiable instances (Dubois et al., 1993; Gomes et
al., 2000), and suggests a natural cut-off cost for the restart.

Figure 1. Types of search trees generated by the DPLL solving procedure on satisfiable
instances. A. lower sat phase, $\alpha < \alpha_L$: the algorithm easily finds a solution with
almost no backtracking. B. upper sat phase, $\alpha_L < \alpha < \alpha_C$: many contradictions
(c) arise before reaching a solution, and backtracking enters massively into play.
Junction G is the highest node in the tree reached back by DPLL. D denotes the
first contradiction detected by DPLL, located at the leaf of the first descent in the
tree.

We start by recalling our previous framework for studying
resolutions taking place with high probability, which is also helpful to
understand rare resolutions (Section II). Numerical experiments are
presented in Section III. Section IV is devoted to the analytical
calculation of $\zeta$, and we present some conclusions in Section V. We have
tried to highlight the different status of the statements and results pre- 
sented: rigorous, expected to be exact but proof lacking, approximate, 
or experimental. We hope this effort will benefit the reader and make 
our work more accessible. 
2. Resolution trajectories: the high probability scenario.



In this section, we briefly recall the main features of the resolution by
DPLL of satisfiable instances of size N, occurring with large probability
as $N \to \infty$ (Chao and Franco, 1986; Chao and Franco, 1990; Achlioptas,
2001b; Cocco and Monasson, 2001).

The action of DPLL on an instance of 3-SAT changes
the overall numbers of variables and clauses, and thus the ratio
$\alpha$. Furthermore, DPLL reduces some 3-clauses to 2-clauses. We use a
mixed 2+p-SAT distribution (Monasson et al., 1999), where p is the
fraction of 3-clauses, to model what remains of the input instance at a
node of the search tree. Using experiments and methods from statistical
mechanics (Monasson et al., 1999), the threshold line $\alpha_C(p)$, separating
sat from unsat phases, may be estimated with the results shown in
Figure 2. For $p < p_T = 2/5$, i.e. to the left of point T, the threshold line
is given by $\alpha_C(p) = 1/(1-p)$, and saturates the upper bound for
the satisfaction of 2-clauses (Monasson et al., 1999; Achlioptas, 2001b).
Above $p_T$, no exact expression for $\alpha_C(p)$ is known.

The phase diagram of 2+p-SAT is the natural space in which the
DPLL dynamics takes place. An input 3-SAT instance with ratio $\alpha$
shows up on the right vertical boundary of Figure 2 as a point of
coordinates $(p = 1, \alpha)$. Under the action of DPLL, the representative
point moves away from the 3-SAT axis and follows a trajectory which
depends on the splitting heuristic implemented in DPLL. We consider
here the so-called Generalized Unit-Clause (GUC) heuristic proposed
by Franco and Chao (1986; 1990) (Franco, 2001; Achlioptas, 2001b).
At each step, a literal is picked at random from one of the shortest
available clauses and set to true, so that this clause is satisfied.
This heuristic induces neither bias nor correlation in the
distribution of instances (Chao and Franco, 1986). Such a statistical
"invariance" is required to ensure that the dynamical evolution generated
by DPLL remains confined to the phase diagram of Figure 2.
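
The procedure just described can be illustrated with a minimal Python sketch of DPLL with a GUC-style splitting rule. It is an illustration under stated assumptions, not the implementation used for the experiments reported in this paper: the function names (`random_3sat`, `assign`, `dpll_guc`), the choice of drawing three distinct variables per clause, and the convention of counting one node per assigned variable (unit-propagations included) are all ours.

```python
import random

def random_3sat(n_vars, n_clauses, rng):
    """Draw a random 3-SAT instance as a list of clauses (lists of 3 literals).

    A literal is +v or -v for a variable v in 1..n_vars. Assumption: the three
    variables of a clause are distinct; each is negated with probability 1/2.
    """
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

def assign(clauses, lit):
    """Set literal `lit` to true; return the reduced clauses, or None on a contradiction."""
    reduced = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause satisfied: drop it
        if -lit in clause:
            clause = [x for x in clause if x != -lit]
            if not clause:
                return None               # empty clause: contradiction
        reduced.append(clause)
    return reduced

def dpll_guc(clauses, rng, stats):
    """Return True if `clauses` is satisfiable; stats["nodes"] counts visited nodes."""
    if not clauses:
        return True
    stats["nodes"] += 1
    shortest = min(len(c) for c in clauses)
    clause = rng.choice([c for c in clauses if len(c) == shortest])
    lit = rng.choice(clause)
    # First try to satisfy the chosen clause, then backtrack on the opposite value.
    # For a unit clause the second attempt fails at once (unit-propagation).
    for choice in (lit, -lit):
        reduced = assign(clauses, choice)
        if reduced is not None and dpll_guc(reduced, rng, stats):
            return True
    return False

rng = random.Random(0)
n = 40
instance = random_3sat(n, int(2.0 * n), rng)   # ratio alpha = 2
stats = {"nodes": 0}
print(dpll_guc(instance, rng, stats), stats["nodes"] / n)
```

The ratio `stats["nodes"] / n` plays the role of Q/N in the text; the recursion depth is bounded by the number of variables, so this sketch is only suitable for sizes of a few hundred variables.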

Chao and Franco were able to analyze rigorously resolutions corresponding
to initial ratios $\alpha < \alpha_L \simeq 3.003$. Their analysis consists in
monitoring the evolution of the densities (numbers divided by N) $c_2$
and $c_3$ of 2- and 3-clauses respectively as more and more variables are
assigned by DPLL. Both densities become highly concentrated around
their averages as the size N goes to infinity. Calling t the fraction of
assigned variables, $c_2(t)$ and $c_3(t)$ obey a set of coupled ordinary
differential equations (ODE),

$$\frac{dc_3}{dt} = -\frac{3\,c_3}{1-t} - \rho_3(t), \qquad \frac{dc_2}{dt} = \frac{3\,c_3}{2(1-t)} - \frac{2\,c_2}{1-t} - \rho_2(t), \tag{1}$$

where $\rho_2(t), \rho_3(t)$ are the probabilities that the split is made from a 2-,
3-clause respectively. For GUC and an initial ratio $\alpha_0 > 2/3$, $\rho_2(t) =
1 - c_2(t)/(1-t)$, $\rho_3(t) = 0$.

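To obtain the single branch trajectory in the phase diagram of Figure 2,
we solve the ODEs (1) with initial conditions $c_2(0) = 0$, $c_3(0) = \alpha_0$,
and perform the change of variables

$$\alpha(t) = \frac{c_2(t) + c_3(t)}{1-t}, \qquad p(t) = \frac{c_3(t)}{c_2(t) + c_3(t)}, \tag{2}$$

to obtain

$$\alpha(t) = \frac{\alpha_0}{4}(1-t)^2 + \frac{3\alpha_0}{4} + \ln(1-t), \qquad p(t) = \frac{\alpha_0 (1-t)^2}{\alpha(t)}. \tag{3}$$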

Results are shown for the GUC heuristic and starting ratios $\alpha_0 = 2$
and 2.8 in Figure 2. The trajectory, indicated with a dashed line, first
heads to the left and then reverses to the right until reaching a point
on the 3-SAT axis at a small ratio. Further action of DPLL leads to a
rapid elimination of the remaining clauses and the trajectory ends up
at the right lower corner S, where a solution is found. Note that for
initial ratios $\alpha_0 < 2/3$, only the second part of the trajectory, restricted
to the p = 1 axis, subsists.

Frieze and Suen (1996) have shown that, for ratios $\alpha_0 < \alpha_L \simeq 3.003$
(for the GUC heuristic), the full search tree essentially reduces to a
single branch, and is thus entirely described by the ODEs (1). The
amount of backtracking necessary to reach a solution is bounded from
above by a power of $\log N$. The average size of the branch Q scales
linearly with N with a multiplicative factor $\gamma(\alpha_0) = Q/N$ that can be
computed exactly (Cocco and Monasson, 2001).

The boundary $\alpha_L$ of this easy sat region can be defined as the largest
initial ratio $\alpha_0$ such that the branch trajectory $p(t), \alpha(t)$ issued from
$\alpha_0$ never leaves the sat phase in the course of DPLL resolution. Indeed,
as $\alpha_0$ increases up to $\alpha_L$, the trajectory gets closer and closer to the
threshold line $\alpha_C(p)$. Finally, at $\alpha_L \simeq 3.003$, the trajectory touches the
threshold curve tangentially at point T.
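
The tangency condition can be checked numerically from equation (3). The Python sketch below is ours and rests on two observations taken from the text: for $p < 2/5$ the threshold line is $\alpha = 1/(1-p)$, and along the trajectory $\alpha(t)\,(1 - p(t)) = c_2(t)/(1-t)$, so that touching the line amounts to the 2-clause ratio reaching 1. The function names, the grid search and the bisection bracket [2, 4] are illustrative choices.

```python
import math

def two_clause_ratio(t, alpha0):
    """c_2(t)/(1-t) = alpha(t) * (1 - p(t)) along the trajectory of equation (3)."""
    return 0.75 * alpha0 * (1.0 - (1.0 - t) ** 2) + math.log(1.0 - t)

def trajectory_point(t, alpha0):
    """(p(t), alpha(t)) from equation (3), for GUC and alpha0 > 2/3."""
    alpha = 0.25 * alpha0 * (1.0 - t) ** 2 + 0.75 * alpha0 + math.log(1.0 - t)
    return alpha0 * (1.0 - t) ** 2 / alpha, alpha

def peak(alpha0, steps=5000):
    """Grid search: (t, ratio) where the 2-clause ratio is largest for t in (0, 1)."""
    grid = (i / steps for i in range(1, steps))
    return max(((t, two_clause_ratio(t, alpha0)) for t in grid), key=lambda tr: tr[1])

def find_alpha_L(lo=2.0, hi=4.0, iterations=30):
    """Bisection on alpha0: largest ratio whose trajectory stays below alpha = 1/(1-p)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if peak(mid)[1] < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_L = find_alpha_L()
t_T, _ = peak(alpha_L)
p_T, alpha_T = trajectory_point(t_T, alpha_L)
print(alpha_L, p_T, alpha_T)
```

Under these assumptions the bisection should return a value close to the quoted $\alpha_L \simeq 3.003$, with the contact point close to T = (2/5, 5/3).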

The concept of trajectory helps to understand how resolution takes
place in the upper sat phase, that is for ratios $\alpha_0$ ranging from $\alpha_L$ to $\alpha_C$.
The branch trajectory, started from the point $(p = 1, \alpha_0)$ corresponding
to the initial 3-SAT instance, hits the critical line $\alpha_C(p)$ at some point
G with coordinates $(p_G, \alpha_G)$ after $N t_G$ variables have been assigned
by DPLL, see Figure 2. The algorithm then enters the unsat phase



pap.tex; 1/02/2008; 21:13; p. 5 






S. Cocco, R. Monasson 



and generates 2+p-SAT instances with no solution. Backtracking will
appear as soon as a contradiction is detected by DPLL. This may occur
at any point along the trajectory (Frieze and Suen, 1996), but no further
than the crossing point D with the $\alpha = 1/(1-p)$ line (beyond D, unit-clauses
are created at a rate larger than their elimination through unit-propagation,
and opposite literals will appear w.h.p.). Later, massive
backtracking enters into play (Cocco and Monasson, 2001) until G is
reached back by DPLL. G is indeed the highest backtracking node in
the tree, since nodes above G are located in the sat phase and carry
2+p-SAT instances with solutions (Figure 1B). DPLL will eventually
reach a solution S (Figure 1B).



3. Numerical experiments. 

In this section we present some numerical experiments on large but 
finite instance sizes, showing some deviations from the high probability 
scenario exposed above. 

3.1. Instance-to-instance distribution of complexities. 

We have first performed some experiments to understand the distribution
of instance-to-instance fluctuations of the solving times (Hogg and
Williams, 1994; Selman and Kirkpatrick, 1996; Gent and Walsh, 1994).
We draw randomly a large number of instances at fixed ratio $\alpha = 3.5$
and size N and, for each of them, run DPLL until a solution is found
(a small number of unsat instances can be present and are discarded).
We show in Figure 3 the normalized histogram of the logarithms $\omega$ of
the corresponding complexities $Q = 2^{N\omega}$. The histogram is made of a
narrow peak (left side) followed by a wider bump (right side). As N
grows, the right peak acquires more and more weight, while the left
peak progressively disappears. The abscissa of the center of the right
peak gets slightly shifted to the left, but seems to reach a finite value
$\omega^* \simeq 0.035$ as $N \to \infty$ (Cocco and Monasson, 2001). This right peak
thus corresponds to the core of exponentially hard resolutions: w.h.p.
resolutions of instances require a time scaling as $2^{N\omega^*}$ as the size of
the instance gets larger and larger, in agreement with the discussion of
Section II.

Figure 2. Phase diagram of 2+p-SAT and dynamical trajectories of DPLL for satisfiable
instances. The threshold line $\alpha_C(p)$ (bold full line) separates sat (lower part
of the plane) from unsat (upper part) phases. Extremities lie on the vertical 2-SAT
(left) and 3-SAT (right) axes at coordinates $(p = 0, \alpha_C = 1)$ and $(p = 1, \alpha_C \simeq 4.3)$
respectively. Departure points for DPLL trajectories are located on the 3-SAT vertical
axis (empty circles) and the corresponding values of $\alpha$ are explicitly given. Arrows
indicate the direction of "motion" along trajectories (dashed curves) parametrized by
the fraction t of variables set by DPLL. For small ratios $\alpha < \alpha_L$, branch trajectories
remain confined in the sat phase and end in S of coordinates (1,0), where a solution is
found. At $\alpha_L$ ($\simeq 3.003$ for the GUC heuristic, see text), the single branch trajectory
hits tangentially the threshold line in T of coordinates (2/5, 5/3). In the range
$\alpha_L < \alpha < \alpha_C$, the branch trajectory intersects the threshold line at some point
G (that depends on $\alpha$). With high probability, a contradiction arises before the
trajectory crosses the dotted curve $\alpha = 1/(1-p)$ (point D); through extensive
backtracking, DPLL later reaches back the highest backtracking node in the search
tree (G) and finds a solution at the end of a new descending branch, see Figure 1B.
With exponentially small probability, the trajectory (dot-dashed curve, full arrow)
is able to cross the "dangerous" region where contradictions are likely to occur; it
then exits from this contradictory region (point D') and ends up with a solution
(lowest dashed curve, light arrow).







Figure 3. Probability distribution of the logarithm $\omega$ of the complexity (base 2, and
divided by N) for $\alpha = 3.5$ and for different sizes N. Histograms are normalized to
unity and obtained from 400,000 (N = 100), 50,000 (N = 200), 20,000 (N = 300),
and 5,000 (N = 400) samples.



On the contrary, the location of the maximum of the left peak seems
to vanish as $\log_2(N)/N$ when the size N increases, indicating that the
left peak accounts for polynomial (linear) resolutions. We have thus
replotted the data shown in Figure 3, changing the scale of the horizontal
axis $\omega = \log_2(Q)/N$ into Q/N. Results are shown in Figure 4.
We have limited ourselves to Q/N < 1, the range of interest to analyse
the left peak of Figure 3. The maximum of the distribution is located
at $Q/N \simeq 0.2 - 0.25$, with weak dependence upon N. The cumulative
probability $P_{lin}$ to have a complexity Q less than, or equal to N, i.e.
the integral of Figure 4 over 0 < Q/N < 1, decreases very quickly
with N. We find an exponential decrease, $P_{lin} = 2^{-N\zeta}$, see Inset of
Figure 4. The rate $\zeta \simeq 0.011 \pm 0.001$ is determined from the slope of
the logarithm of the probability shown in the Inset.

Figure 4. Probability distributions of the complexity Q (divided by the size N) for
sizes N = 100 (full line), N = 200 (dashed line), N = 300 (dotted line), N = 400
(dashed-dotted line). Distributions are not shown for complexities larger than N.
Inset: Minus logarithm of the cumulative probability of complexities smaller than or equal
to N as a function of N, for sizes ranging from 100 to 400 (full line); logarithm of
the number of restarts necessary to find a solution for sizes ranging from 100 to 1000
(dotted line). Slopes are equal to $\zeta = 0.011$ and $\bar\zeta = 0.0115$ respectively.



3.2. Locus of highest backtracking points.

To gain some intuition on the origin of fast, linear resolutions, we have
looked for the locus of the highest backtracking nodes G in the search
trees. In the infinite size limit, G is located w.h.p. at the crossing G* of
the resolution trajectory and the critical sat/unsat line (Section II). In
Figure 5 we show numerical evidence for the link between complexity
and trajectories in the phase diagram for finite instance sizes. We have
run DPLL on 20,000 instances ($\alpha = 3.5$, N = 300), and reported for each of
them the coordinates $p_G, \alpha_G$ of the highest backtracking point, and
the logarithm $\omega$ of the corresponding complexity. Large complexities
($\omega > 0.03$, right bump of Figure 3) are associated to points G forming
a cloud centered around G* in the phase diagram of the 2+p-SAT
model, while points G related to small complexities ($\omega < 0.02$, left peak
of Figure 3) are much more scattered in the phase diagram. Notice
the strong correlation between the value of $\omega$ and the average location
of G along the branch trajectory of Section II. In the following we
will concentrate on linear resolutions only. A complementary analysis
of the distribution of exponential resolutions for the problem of the
vertex covering of random graphs was recently done by Montanari and
Zecchina (2002).

Figure 5 shows that easy resolutions correspond to trajectories capable
of crossing the contradiction line $\alpha = 1/(1-p)$. This, in addition
to the linear scaling of the corresponding complexities, indicates that
easy resolutions coincide with first descents in the search tree ending
with a contradiction located far beyond D in the phase diagram, and
then requiring a very limited amount of backtracking before a solution
is found.

This statement is supported by the analysis of the number of unit-clauses
generated during easy resolutions. We have measured the maximal
number $(C_1)_{max}$ of unit-clauses generated along the last branch
in the tree, leading to the solution S (Figure 1B). We found that
$(C_1)_{max}$ scales linearly with N with an extrapolated ratio $(C_1)_{max}/N \simeq
0.022$ for $\alpha = 3.5$. This linear scaling of the number of unit-clauses is
additional evidence that the trajectory enters the "dangerous" region
$\alpha > 1/(1-p)$ of the phase diagram where unit-clauses accumulate. In
the presence of O(N) 1-clauses, the probability of survival of
the branch (absence of contradictory literals among the unit-clauses)
will be exponentially small in N, in agreement with the scaling of the left
peak weight in Figure 3.



3.3. Run-to-run fluctuations and restart experiments.

We have so far considered the instance-to-instance fluctuations of the 
complexity, that is the distribution of complexity obtained from one 
run of DPLL on each of a large number of instances. In Figure 6, we 
now show the histogram of complexities for a large number of runs 
on a unique, random instance. After each run, clauses and variables 
are randomly relabeled to avoid any correlation between different runs. 
Figure 6 shows that these run-to-run distributions are qualitatively
independent of the particular instance, and similar to the instance-to-instance
distribution of Figure 3 (Hogg and Williams, 1994; Selman and
Kirkpatrick, 1996).

Figure 5. Locus of highest backtracking points G in the phase diagram of the
2+p-SAT model for 20,000 instances with N = 300 (horizontal axis: fraction of
3-clauses p). The bold gray line represents the
first branch trajectory for $\alpha = 3.5$. Colors reflect the complexities of the instances,
whose logarithms $\omega$ range from 0.01 to 0.09, and are divided into 8 intervals of
width $\Delta\omega = 0.01$ and increasing darkness. Filled triangles are the centers of mass
of points G for each of the 8 intervals (the larger $\omega$, the closer to the p = 1 axis).

The similarity between run-to-run and instance-to-instance fluctuations
for large sizes speaks up for the use of a systematic stop-and-restart
heuristic to speed up resolution: if a solution is not found before
N splits, DPLL is stopped and launched again after some random permutations
of the variables and clauses. Intuitively, the expected number
of restarts necessary to find a solution should indeed be equal to the
inverse of the weight of the linear complexity peak in Figure 3, with a
resulting total complexity scaling as $N\,2^{0.011 N}$, much smaller than
the one-run complexity $2^{0.035 N}$ of DPLL (Section II).
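
The comparison between the two scalings can be made concrete with a few lines of Python. This is a back-of-the-envelope sketch, not a measurement: it assumes that all prefactors are equal to one, that each failed run costs exactly N splits, and that the exponents measured at $\alpha = 3.5$ hold at every size; the function names are ours.

```python
import math

ZETA = 0.011    # exponent of the linear-resolution probability (alpha = 3.5)
OMEGA = 0.035   # exponent of the typical one-run complexity (alpha = 3.5)

def log2_restart_cost(n, zeta=ZETA):
    """log2 of N * 2^(zeta N): about 2^(zeta N) runs, each cut off after N splits."""
    return math.log2(n) + zeta * n

def log2_single_run_cost(n, omega=OMEGA):
    """log2 of the typical cost 2^(omega N) of a single run without restart."""
    return omega * n

def crossover_size(zeta=ZETA, omega=OMEGA, n_max=10**6):
    """Smallest size N >= 2 at which the restart estimate beats the single-run one."""
    for n in range(2, n_max):
        if log2_restart_cost(n, zeta) < log2_single_run_cost(n, omega):
            return n
    return None

for n in (100, 300, 1000):
    print(n, round(log2_restart_cost(n), 2), round(log2_single_run_cost(n), 2))
print("crossover size:", crossover_size())
```

With these crude assumptions the polynomial factor N delays the benefit of restarts to sizes of a few hundred variables, after which the gap between the two exponents dominates.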

We check the above reasoning by measuring the number $N_{rest}$ of
restarts performed before a solution is finally reached with the stop-and-restart
heuristic, and averaging $\log_2(N_{rest})$ over a large number of
random instances. Results are reported in the Inset of Figure 4. The
typical number $N_{rest} = 2^{N\bar\zeta}$ of required restarts clearly grows exponentially
as a function of the size N with a rate $\bar\zeta = 0.0115 \pm 0.001$. To the
accuracy of the experiments, $\bar\zeta$ and $\zeta$ coincide as expected.

Figure 6. Probability distributions of the logarithm $\omega$ of the resolution complexity
(horizontal axis: logarithm of complexity, divided by size N)
from 20,000 runs of DPLL. Each of the five distributions corresponds to one
randomly drawn instance of size N = 300. The black curve shows the instance-to-instance
fluctuations of the complexity displayed in Figure 3.

4. Large deviation analysis of the first descent in the tree. 

4.1. Evolution equation for the instance. 

Hereafter we compute the probability $P(C_1, C_2, C_3; T)$ that the first
branch of the tree carries an instance with $C_j$ j-clauses (j = 1, 2, 3) after
T variables have been assigned (and no contradiction has occurred). Let
us call $\mathbf{C}$ the vector $(C_1, C_2, C_3)$. P obeys a Markovian evolution

$$P(\mathbf{C}; T+1) = \sum_{\mathbf{C}'} K(\mathbf{C}, \mathbf{C}'; T)\, P(\mathbf{C}'; T), \tag{4}$$

where the entries of the transition matrix K are equal to (for the GUC 
heuristic), 
(5)



and $\delta_C$ denotes the Kronecker delta function: $\delta_C = 1$ if C = 0, $\delta_C = 0$
otherwise. $M = N - T$ is the number of unassigned variables.

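We then define the generating function $\hat P(\mathbf{y}; T)$ of the probabilities
$P(\mathbf{C}; T)$, where $\mathbf{y} = (y_1, y_2, y_3)$, through

$$\hat P(\mathbf{y}; T) = \sum_{\mathbf{C}} e^{\mathbf{y}\cdot\mathbf{C}}\, P(\mathbf{C}; T), \tag{6}$$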



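where · denotes the scalar product. Evolution equation (4) can be
rewritten in terms of the generating function $\hat P$,

$$\hat P(\mathbf{y}; T+1) = e^{-g_1(\mathbf{y})}\, \hat P(\mathbf{g}(\mathbf{y}); T) + \left( e^{-g_2(\mathbf{y})} - e^{-g_1(\mathbf{y})} \right) \hat P(-\infty, g_2(\mathbf{y}), g_3(\mathbf{y}); T), \tag{7}$$

where $\mathbf{g}$ is a vectorial function of argument $\mathbf{y}$ whose components read

$$\begin{aligned}
g_1(\mathbf{y}) &= y_1 + \ln\left[ 1 + \frac{1}{N-T} \left( \frac{e^{-y_1}}{2} - 1 \right) \right], \\
g_2(\mathbf{y}) &= y_2 + \ln\left[ 1 + \frac{2}{N-T} \left( \frac{e^{-y_2}}{2} \left(1 + e^{y_1}\right) - 1 \right) \right], \\
g_3(\mathbf{y}) &= y_3 + \ln\left[ 1 + \frac{3}{N-T} \left( \frac{e^{-y_3}}{2} \left(1 + e^{y_2}\right) - 1 \right) \right].
\end{aligned} \tag{8}$$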



We now solve equation (7) by making some hypotheses on the scaling
behavior of $\hat P$ for large sizes.

4.2. Hypothesis for the large N scaling of the probability.

Calculations leading to equation (7) are rigorous. We shall now make
some hypotheses on $P$ and $\hat P$ that we believe to be correct in the large size
N limit, but without providing rigorous proofs for their validity. Our
approach, common in statistical mechanics, may be seen as a practical
way to establish conjectures.
First, each time DPLL assigns variables through splitting or unit-propagation,
the numbers $C_j$ of clauses of length j undergo O(1) changes.
It is thus sensible to assume that after a number T = t N of variables
have been assigned, the densities $c_j = C_j/N$ of clauses have been modified
by O(1). This translates into a scaling Ansatz for the probability
P,

$$P(\mathbf{C}; T) = e^{N \phi(c_1, c_2, c_3; t)} \qquad (\phi \le 0) \tag{9}$$

up to corrections that are non-exponential in N. From equations (6) and (9),
we obtain the following scaling hypothesis for the generating function
$\hat P$,

$$\hat P(\mathbf{y}; T) = e^{N \varphi(\mathbf{y}; t)} \tag{10}$$



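up to terms that are non-exponential in N. Notice that $\varphi$ and $\phi$, the exponents
of $\hat P$ and $P$ respectively, are simply
related to each other through Legendre transform,

$$\varphi(\mathbf{y}; t) = \max_{\mathbf{c}} \left[ \phi(\mathbf{c}; t) + \mathbf{y}\cdot\mathbf{c} \right], \tag{11}$$

$$\phi(\mathbf{c}; t) = \min_{\mathbf{y}} \left[ \varphi(\mathbf{y}; t) - \mathbf{y}\cdot\mathbf{c} \right]. \tag{12}$$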



In particular, $\varphi(\mathbf{y} = \mathbf{0}; t)$ is the logarithm of the probability (divided
by N) that the first branch has not been hit by any contradiction after
a fraction t of variables have been assigned. The most probable values
of the densities $c_j(t)$ of j-clauses are then obtained from the partial
derivatives of $\varphi(\mathbf{y}; t)$ in $\mathbf{y} = \mathbf{0}$: $c_j(t) = \partial\varphi/\partial y_j(\mathbf{y} = \mathbf{0})$.

We now present the partial differential equations (PDE) obeyed by
$\varphi$. Two cases must be distinguished: the number $C_1$ of unit-clauses may
be bounded ($C_1 = O(1)$, $c_1 = o(1)$), or of the order of the instance size
($C_1 = \Theta(N)$, $c_1 = \Theta(1)$).



4.3. Case $C_1 = O(1)$: a large deviation analysis around
Chao and Franco's result.



When DPLL starts running on a 3-SAT instance, very few unit-clauses
are generated and splittings occur frequently. In other words, the probability
that $C_1 = 0$ is strictly positive when N gets large. Consequently,
both terms on the r.h.s. of (7) are of the same order, and we make the
hypothesis that $\varphi$ does not depend on $y_1$: $\varphi(y_1, y_2, y_3; t) = \varphi(y_2, y_3; t)$.
This hypothesis simply expresses that $c_1 = \partial\varphi/\partial y_1$ identically vanishes.
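Inserting expression (10) into the evolution equation (7), we find (see footnote 1)

$$\frac{\partial\varphi}{\partial t}(y_2, y_3; t) = -y_2 + 2\,\gamma(y_2, y_2; t)\, \frac{\partial\varphi}{\partial y_2}(y_2, y_3; t) + 3\,\gamma(y_2, y_3; t)\, \frac{\partial\varphi}{\partial y_3}(y_2, y_3; t), \tag{13}$$

where function $\gamma$ is defined through

$$\gamma(u, v; t) = \frac{1}{1-t} \left( \frac{e^{-v}}{2} \left(1 + e^{u}\right) - 1 \right). \tag{14}$$

PDE (13) together with initial condition $\varphi(\mathbf{y}; t = 0) = \alpha_0\, y_3$ (where $\alpha_0$
is the ratio of clauses per variable of the 3-SAT instance) can be solved
exactly with the resulting expression,

$$\begin{aligned}
\varphi(y_2, y_3; t) = {} & \alpha_0 \ln\left[ 1 + (1-t)^3 \left( e^{y_3} - \frac{3}{4} e^{y_2} - \frac{1}{4} \right) + \frac{3(1-t)}{4} \left( e^{y_2} - 1 \right) \right] \\
& + (1-t)\, y_2\, e^{y_2} + (1-t) \left( e^{y_2} - 1 \right) \ln(1-t) \\
& - \left( e^{y_2} + t - t\, e^{y_2} \right) \ln\left( e^{y_2} + t - t\, e^{y_2} \right).
\end{aligned} \tag{15}$$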

Chao and Franco's analysis of the GUC heuristic may be recovered
when $y_2 = y_3 = 0$ as expected. It is very easy to check that $\varphi(y_2 = 0, y_3 = 0; t) = 0$
(the probability of survival of the branch is not exponentially
small in N (Frieze and Suen, 1996)), and that the derivatives
$c_2(t), c_3(t)$ of $\varphi(y_2, y_3; t)$ with respect to $y_2$ and $y_3$ coincide with the
solutions of (1). In addition, our calculation also provides a complete
description of rare deviations of the resolution trajectory from its highly
probable locus shown in Figure 2. As a simple numerical example,
consider DPLL acting on a 3-SAT instance of ratio $\alpha_0 = 3.5$. Chao
and Franco's analysis shows that, once e.g. t = 20% of variables have
been assigned, the densities of 2- and 3-clauses are w.h.p. equal to
$c_2 \simeq 0.577$ and $c_3 \simeq 1.792$ respectively. Expression (15) gives access
to the exponentially small probabilities that $c_2$ and $c_3$ differ from their
most probable values. For instance, choosing $y_2 = -0.1$, $y_3 = 0.05$,
we find from (15) and (12) that there is a probability $e^{-0.00567\,N}$ that
$c_2 = 0.504$ and $c_3 = 1.873$ for the same fraction t = 0.2 of eliminated


1 PDE (13) is correct in the major part of the $y_1, y_2, y_3$ space and, in particular,
in the vicinity of $\mathbf{y} = \mathbf{0}$ we focus on in this paper. It has, however, to be modified in
a small region of the $y_1, y_2, y_3$ space; a complete analysis of this case is not reported
here but may be easily reconstructed along the lines of Appendix A in (Cocco and
Monasson, 2001b).
variables. By scanning all the values of $y_2, y_3$ we can obtain a complete
description of large deviations from Chao and Franco's result (see footnote 2).
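
The numerical example above can be reproduced with a short Python sketch. It evaluates the exponent of the generating function in the case where unit-clauses remain O(1), in the closed form that solves PDE (13) with initial condition $\alpha_0 y_3$, obtains the densities $c_2, c_3$ as its partial derivatives, and applies the Legendre transform (12) at the corresponding stationary point. The function names and the use of central finite differences for the derivatives are our choices.

```python
import math

def varphi(y2, y3, t, alpha0):
    """Exponent of the generating function, equation (15), case C_1 = O(1)."""
    e2, e3 = math.exp(y2), math.exp(y3)
    w0 = e2 + t - t * e2
    log_arg = (1.0 + (1.0 - t) ** 3 * (e3 - 0.75 * e2 - 0.25)
               + 0.75 * (1.0 - t) * (e2 - 1.0))
    return (alpha0 * math.log(log_arg)
            + (1.0 - t) * y2 * e2
            + (1.0 - t) * (e2 - 1.0) * math.log(1.0 - t)
            - w0 * math.log(w0))

def densities(y2, y3, t, alpha0, h=1e-6):
    """c_2 and c_3 as partial derivatives of varphi (central finite differences)."""
    c2 = (varphi(y2 + h, y3, t, alpha0) - varphi(y2 - h, y3, t, alpha0)) / (2 * h)
    c3 = (varphi(y2, y3 + h, t, alpha0) - varphi(y2, y3 - h, t, alpha0)) / (2 * h)
    return c2, c3

def log_probability(y2, y3, t, alpha0):
    """Equation (12) at the stationary point: varphi - y . c (natural log, divided by N)."""
    c2, c3 = densities(y2, y3, t, alpha0)
    return varphi(y2, y3, t, alpha0) - y2 * c2 - y3 * c3, c2, c3

print(densities(0.0, 0.0, 0.2, 3.5))            # most probable densities at t = 0.2
print(log_probability(-0.1, 0.05, 0.2, 3.5))    # rare deviation discussed in the text
```

Scanning `y2` and `y3` over a grid and recording the triples returned by `log_probability` gives a numerical picture of the full large deviation function at fixed t.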

The assumption $C_1 = O(1)$ breaks down for the most probable
trajectory at some fraction $t_D$, e.g. $t_D \simeq 0.308$ for $\alpha_0 = 3.5$, at which the
trajectory hits point D on Figure 2. Beyond D, 1-clauses accumulate
and the probability of survival of the first branch is exponentially small
in N.



4.4. Case $C_1 = O(N)$: passing through the "dangerous" region.

When the number of unit-clauses becomes of the order of N, variables
are a.s. assigned through unit-propagation. The first term on the r.h.s.
of equation (7) is now exponentially dominant with respect to the
second one. The density of 1-clauses is strictly positive, and $\varphi$ depends
on $y_1$. We then obtain the following PDE,

$$\begin{aligned}
\frac{\partial\varphi}{\partial t}(y_1, y_2, y_3; t) = {} & -y_1 + \gamma(-\infty, y_1; t)\, \frac{\partial\varphi}{\partial y_1}(y_1, y_2, y_3; t) \\
& + 2\,\gamma(y_1, y_2; t)\, \frac{\partial\varphi}{\partial y_2}(y_1, y_2, y_3; t) \\
& + 3\,\gamma(y_2, y_3; t)\, \frac{\partial\varphi}{\partial y_3}(y_1, y_2, y_3; t),
\end{aligned} \tag{16}$$

with $\gamma(u, v; t)$ given by equation (14). When $y_1 = y_2 = y_3 = 0$, equation
(16) simplifies to

$$\frac{dz}{dt}(t) = -\frac{c_1(t)}{2(1-t)}, \tag{17}$$

where $c_1(t)$ is the most probable value of the density of unit-clauses, and
$z(t)$ is the logarithm of the probability that the branch has not encountered
any contradiction (divided by N). The interpretation of (17) is
transparent. Each time a literal is assigned through unit-propagation,
there is a probability $(1 - 1/(2(N-T)))^{C_1 - 1} \simeq e^{-c_1/(2(1-t))}$ that no
contradiction occurs. The r.h.s. of (17) thus corresponds to the rate of
decay of z with "time" t.
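
The large N estimate of the one-step survival probability can be checked with a short Python sketch. The values of N, t and $c_1$ below are arbitrary illustrative choices, and the function names are ours.

```python
import math

def survival_one_step(n, t, c1):
    """Probability that one unit-propagation step creates no contradiction.

    With N - T = n (1 - t) unassigned variables and C_1 = c1 * n unit clauses,
    each of the other C_1 - 1 unit clauses carries the opposite literal with
    probability 1 / (2 (N - T)).
    """
    remaining = n * (1.0 - t)
    unit_clauses = c1 * n
    return (1.0 - 1.0 / (2.0 * remaining)) ** (unit_clauses - 1.0)

def survival_one_step_large_n(t, c1):
    """Large N limit of the same probability, exp(-c1 / (2 (1 - t)))."""
    return math.exp(-c1 / (2.0 * (1.0 - t)))

n, t, c1 = 10**6, 0.5, 0.02
print(survival_one_step(n, t, c1), survival_one_step_large_n(t, c1))
```

Since N dt variables are assigned during a "time" interval dt, multiplying N dt such factors gives $e^{-N c_1\, dt/(2(1-t))}$, which is the content of equation (17).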

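We have not been able to solve analytically PDE (16), and have resorted
to an expansion of $\varphi$ in powers of $\mathbf{y}$. To k-th order, we approximate
the solution of (16) by a polynomial of total degree k,

$$\varphi^{(k)}(\mathbf{y}, t) = \sum_{e_1 + e_2 + e_3 \le k} \varphi^{(k)}_{e_1, e_2, e_3}(t)\; y_1^{e_1}\, y_2^{e_2}\, y_3^{e_3}, \tag{18}$$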



2 Though we are not concerned here with subexponential (in N) corrections to
probabilities, let us mention that it is also possible to calculate the probability of
split ($C_1 = 0$) per unit of time, extending Frieze and Suen's result (1996) to $\mathbf{y} \ne \mathbf{0}$.
Table I. Results at different orders k of approximation for
$\alpha_0 = 3.5$: exponent $\zeta$ of the probability $2^{-N\zeta}$ that the first branch
is not hit by any contradiction, maximal density $(c_1)_{max}$ of
unit-clauses ever reached, fraction of eliminated variables $t_{D'}$
and coordinates $p_{D'}, \alpha_{D'}$ at point D', i.e. when the number
of unit-clauses ceases to be O(N), complexity ratio $\gamma = Q/N$
of the corresponding linear resolution.

order   $\zeta$   $(c_1)_{max}$   $t_{D'}$   $p_{D'}$   $\alpha_{D'}$   $\gamma$
1       0.0384    0.0502          0.8878     0.0804     0.5477          0.1720
2       0.0036    0.0121          0.6553     0.2707     1.575           0.1990
3       0.0098    0.0227          0.7495     0.1901     1.201           0.2069
4       0.0098    0.0226          0.7483     0.1911     1.206           0.2069


and insert (18) on the r.h.s. of (16). We collect on the l.h.s. the terms of
degree $\le k$ and obtain a set of $M_k = (k+3)(k+2)(k+1)/6$ first order
coupled linear ODEs for the coefficients $\varphi^{(k)}_{e_1, e_2, e_3}(t)$ of the polynomial
(18). This approximation gets better and better as k increases at a cost
of more and more coupled ODEs to be solved. The initial conditions
for these ODEs are chosen to match the expansion of the exact solution
(15) at time $t_D$.
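
The number $M_k$ of equations is simply the number of monomials $y_1^{e_1} y_2^{e_2} y_3^{e_3}$ of total degree at most k. As a small check, the following Python sketch (function names are ours) enumerates the exponent triples and compares their number with the closed formula.

```python
from itertools import product

def exponent_triples(k):
    """All (e1, e2, e3) with non-negative entries and e1 + e2 + e3 <= k."""
    return [e for e in product(range(k + 1), repeat=3) if sum(e) <= k]

def n_odes(k):
    """Closed formula M_k = (k + 3)(k + 2)(k + 1) / 6."""
    return (k + 3) * (k + 2) * (k + 1) // 6

for k in range(1, 5):
    print(k, len(exponent_triples(k)), n_odes(k))
```

At order k = 1 the four triples (0,0,0), (1,0,0), (0,1,0), (0,0,1) correspond to the four coefficients of the lowest order approximation.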

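At the lowest order (k = 1), we find a set of four coupled equations
for $z^{(1)}(t) \equiv \varphi^{(1)}_{000}(t)$, $c_1^{(1)}(t) \equiv \varphi^{(1)}_{100}(t)$,
$c_2^{(1)}(t) = \varphi^{(1)}_{010}(t)$, $c_3^{(1)}(t) = \varphi^{(1)}_{001}(t)$ that read

$$
\begin{aligned}
\frac{dc_1^{(1)}(t)}{dt} &= -\frac{c_1^{(1)}(t)}{2(1-t)} + \frac{c_2^{(1)}(t)}{1-t} - 1 \\
\frac{dc_2^{(1)}(t)}{dt} &= -\frac{2\,c_2^{(1)}(t)}{1-t} + \frac{3\,c_3^{(1)}(t)}{2(1-t)} \\
\frac{dc_3^{(1)}(t)}{dt} &= -\frac{3\,c_3^{(1)}(t)}{1-t} \\
\frac{dz^{(1)}(t)}{dt} &= -\frac{c_1^{(1)}(t)}{2(1-t)}
\end{aligned}
\qquad (19)
$$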



together with the initial conditions $c_1^{(1)}(t_0) = z^{(1)}(t_0) = 0$,
$c_2^{(1)}(t_0) = 1 - t_0$, $c_3^{(1)}(t_0) = \alpha_0 (1 - t_0)^3$, with $t_0$ uniquely
determined from $\alpha_0$.
The solution of (19) for $\alpha_0 = 3.5$ shows that $c_1$ first increases and
reaches its top value $(c_1)_{max} \simeq 0.05$. It then decreases and vanishes
at $t_{D'} \simeq 0.89$, where the trajectory exits the "dangerous" region where
Figure 7. Exponent $\zeta$ of the linear resolution probability (simulations: filled squares,
theory: full line), and exponent $\omega$ of the typical complexity (simulations: empty
circles, theory from (Cocco, Monasson 2001b): dotted line), as a function of the
clause per variable ratio $\alpha$.



contradictions occur w.h.p. (Figure 2). The probability of this event
scales as $2^{-N\zeta^{(1)}}$ for large N, with $\zeta^{(1)} = -z^{(1)}(t_{D'})/\ln 2 \simeq 0.038$. The
end of the resolution trajectory obeys Chao and Franco's equations (1).
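
The order-1 numbers quoted above can be checked with a few lines of numerical integration. The sketch below is not the authors' code: it assumes that the k = 1 system reads $dc_1/dt = c_2/(1-t) - 1 - c_1/(2(1-t))$, $dc_2/dt = 3c_3/(2(1-t)) - 2c_2/(1-t)$, $dc_3/dt = -3c_3/(1-t)$, $dz/dt = -c_1/(2(1-t))$, and that $t_0$ is the time at which the GUC branch $c_2/(1-t) = (3\alpha_0/4)(1-(1-t)^2) + \ln(1-t)$ reaches 1. The integrator (fixed-step Runge-Kutta), the step size and all function names are illustrative choices.

```python
import math


def integrate_first_order(alpha0=3.5, dt=1e-4):
    """Integrate the assumed order-1 equations from t0 until c1 returns to zero.

    Returns (t0, t_end, c1_max, zeta), with zeta = -z(t_end) / ln 2.
    """
    # t0: time at which the assumed GUC closed form gives c2/(1-t) = 1.
    def g(t):
        return 0.75 * alpha0 * (1.0 - (1.0 - t) ** 2) + math.log(1.0 - t) - 1.0

    lo, hi = 0.0, 0.5  # bracket chosen for alpha0 = 3.5
    assert g(lo) < 0.0 < g(hi), "bracket does not contain t0 for this alpha0"
    for _ in range(60):  # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t0 = 0.5 * (lo + hi)

    def rhs(t, y):
        c1, c2, c3, z = y
        s = 1.0 - t
        return (c2 / s - 1.0 - c1 / (2.0 * s),
                1.5 * c3 / s - 2.0 * c2 / s,
                -3.0 * c3 / s,
                -c1 / (2.0 * s))

    def rk4_step(t, y, h):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(y, k3)))
        return tuple(a + h / 6 * (p + 2 * q + 2 * r + w)
                     for a, p, q, r, w in zip(y, k1, k2, k3, k4))

    t = t0
    y = (0.0, 1.0 - t0, alpha0 * (1.0 - t0) ** 3, 0.0)  # initial conditions at t0
    c1_max = 0.0
    while t + dt < 1.0:
        y_next = rk4_step(t, y, dt)
        if y_next[0] <= 0.0:  # unit clauses exhausted: exit point D'
            break
        t, y = t + dt, y_next
        c1_max = max(c1_max, y[0])
    zeta = -y[3] / math.log(2.0)
    return t0, t, c1_max, zeta


t0, t_end, c1_max, zeta = integrate_first_order()
print(f"t0={t0:.4f}  t_D'={t_end:.4f}  (c1)_max={c1_max:.4f}  zeta={zeta:.4f}")
```

The printed values can be compared with the k = 1 row of Table I; the exit time is resolved only up to the step size dt.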

Results improve when going to higher orders in k, see Table I. No
appreciable difference can be found between the k = 3 and k = 4 results. The
calculated values $\zeta \simeq 0.01$, $(c_1)_{max} \simeq 0.022$ and $\gamma \simeq 0.21$ are in very
good agreement with the numerical experiments of Section III.

We report in Figure 7 the experimental and theoretical values of $\zeta$
found over the whole range $\alpha_L < \alpha_0 < \alpha_C$. Note the very good
agreement between our quantitative theory and simulations, which supports
the scaling hypothesis made above.
5. Conclusions.

In this work, we have studied deviations from the typical (i.e. occurring
w.h.p.) solving complexity of satisfiable random 3-SAT instances using
the DPLL algorithm with a simple splitting heuristic (GUC) (Chao and
Franco, 1986; Chao and Franco, 1990). For ratios $\alpha$ of clauses per
variable in the range $\alpha_L = 3.003 < \alpha < \alpha_C$, complexity grows almost
surely exponentially with the size N of the instance, but resolution
may very rarely (i.e. with an exponentially small probability) require a
polynomial (linear) computational effort only. These linear resolutions
correspond to search trees that essentially reduce to a single branch
(Figure 1A), and can be visualized as trajectories that cross the unsat phase
of the Figure 2 diagram without being stopped by any contradiction.
Our approach allowed us to calculate the large deviations from typical
resolutions, and the exponent $\zeta$ of the probability $P \sim 2^{-N\zeta}$ of linear
resolutions. Our theoretical calculation predicts for instance that the
exponent corresponding to random 3-SAT instances with ratio $\alpha = 3.5$
equals $\zeta \simeq 0.01$, in very good agreement with the value extrapolated
from the histogram of resolution times on different instances
(instance-to-instance distribution of complexities) and with the value of
$\zeta$ extrapolated from the number of restarts necessary to solve one random
instance (Inset of Figure 4).

The computational effort to find a solution with the systematic
restart procedure, $N_{rest} \sim 1/P_{lin} \sim 2^{N\zeta}$, turns out to be exponentially
smaller than the typical time $2^{N\omega}$ needed to find a solution without restart
(e.g. $\omega = 0.035$ for $\alpha = 3.5$). Our calculation thus gives some theoretical
support to the use of restart-like procedures (see also (Montanari and
Zecchina, 2002) for recent theoretical results), empirically known to
speed up resolutions considerably (Dubois et al., 1993; Gomes et al.,
2000). To be more concrete: without restarts, we were able to
solve with the DPLL algorithm instances with 500 variables in about one
day of CPU time (for $\alpha = 3.5$), whereas the restart procedure allows us to
solve instances with 1000 variables in 15 minutes with the same computer
and splitting heuristic (GUC).
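
To get a feeling for the size of the gain, one can compare the two exponential scalings for the exponents quoted above ($\zeta \simeq 0.01$, $\omega = 0.035$ at $\alpha = 3.5$). The sketch below is only an order-of-magnitude estimate: it assumes that each restart is cut off after about N nodes, so that the total cost with restarts is roughly $N \cdot 2^{N\zeta}$, and it ignores all other subexponential prefactors. The function names are illustrative.

```python
import math


def log2_cost_with_restarts(n, zeta):
    """log2 of about N * 2^(N*zeta): ~2^(N*zeta) runs, each cut off after ~N nodes."""
    return n * zeta + math.log2(n)


def log2_cost_without_restarts(n, omega):
    """log2 of the typical search tree size 2^(N*omega) of a single DPLL run."""
    return n * omega


for n in (500, 1000, 2000):
    with_r = log2_cost_with_restarts(n, zeta=0.01)
    without_r = log2_cost_without_restarts(n, omega=0.035)
    print(f"N={n}: log2(cost) with restarts ~ {with_r:.1f}, without ~ {without_r:.1f}")
```

Since $\zeta < \omega$, the difference between the two exponents grows linearly with N, which is the exponential speed-up referred to in the text.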

The present work suggests that the cut-off time, at which the search
is halted and restarted, need not be precisely tuned but is simply given
by the size of the instance. This conclusion could be generic and apply
to other combinatorial decision problems and other heuristics. More
precisely, if a combinatorial problem admits some efficient (polynomial)
search heuristic for some values of a control parameter (e.g. the ratio
$\alpha$ here, or the average adjacency degree for the coloring problem of
random graphs), there might be an exponentially small probability that
the heuristic is still successful (in polynomial time) in the range of
parameters where resolution almost surely requires massive backtracking
and exponential effort. When the decay rate $\zeta$ of the polynomial time
resolution probability is smaller than the growth rate $\omega$ of the typical
exponential resolution time, stop-and-restart procedures with a cut-off
in the search equal to a polynomial of the instance size will lead to an
exponential speed up of resolutions.

It would be interesting to extend the previous approach to more
sophisticated and powerful search heuristics, e.g. satz or chaff. It is
however not clear how a full analytical study could be worked out
without resorting to approximate expressions for the transition matrix.
Another natural extension of the present work would be to focus on
other decision problems, e.g. graph coloring, for which the high-probability
behaviour of simple heuristics is well understood (Achlioptas and
Molloy, 1997).